-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(controller): git-push step: pull --rebase before push #3119
Conversation
✅ Deploy Preview for docs-kargo-io ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3119 +/- ##
==========================================
+ Coverage 51.26% 51.27% +0.01%
==========================================
Files 283 285 +2
Lines 25563 25706 +143
==========================================
+ Hits 13104 13182 +78
- Misses 11762 11824 +62
- Partials 697 700 +3 ☔ View full report in Codecov by Sentry. |
9b3979c
to
0fac579
Compare
595305a
to
1815071
Compare
1815071
to
ce3b000
Compare
Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]>
… errors/failures Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]> Co-authored-by: Hidde Beydals <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]>
3d4ccb1
to
efa806f
Compare
internal/directives/git_pusher.go
Outdated
// is indeed the maximum number of attempts. | ||
backoff.Steps = int(*cfg.MaxAttempts) | ||
} else { | ||
backoff.Steps = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default seems pretty high, I think something like 10
would probably be better. Unless you have a specific reason to opt for this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the following, and ~20 promos running concurrently and writing to the same branch, I was occasionally able to exhaust attempts:
wait.Backoff{
Duration: 1 * time.Second,
Factor: 2,
Steps: 10,
Cap: 2 * time.Minute,
Jitter: 0.1,
}
Being aware of how extensive some users' monorepos are, I presumed 10 to be not nearly enough.
The effective defaults are now:
wait.Backoff{
Steps: 50,
Duration: 10 * time.Millisecond,
Factor: 5.0,
Jitter: 0.1,
}
I think I still like defaulting to a reasonably high number of steps, but what I definitely didn't account for here is that retry.DefaultBackoff
has no cap -- which makes 50 steps absurd.
What do you think about leaving it 50, but capping it at something like 5s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I could use a mutex to make the pull + rebase + push a critical section of code.
Given that controllers can be sharded, it won't eliminate the need for retries entirely, but it would improve the situation considerably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using this default backoff because, iirc, you'd suggested it, but I'm rethinking that.
In general, the best way to avoid conflicts will be spacing out the retries of the multiple concurrent promos...
That's got me thinking about a larger starting interval and a larger jitter.
At the same time, increasing the interval by a factor of five at each step seems way too aggressive. Even two seems aggressive.
I'm contemplating something like this:
wait.Backoff{
Steps: 10,
Duration: time.Second,
Factor: 1.5,
Jitter: 0.5,
Cap: 30 * time.Second,
}
I'll try that in the morning to see how it performs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following works well on its own. It reduced the maximum number of retries I saw in my own example to 6 and most time, there was success with fewer retries, if any at all.
wait.Backoff{
Steps: 10,
Duration: time.Second,
Factor: 1.5,
Jitter: 0.5,
Cap: 30 * time.Second,
}
I also added a repo + branch lock, so retries should only ever be necessary in the event of sharded controllers concurrently pushing to the same branch.
With this added, I see no retries at all.
Both improvements are in 54b2a27.
Signed-off-by: Kent Rancourt <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]> Co-authored-by: Hidde Beydals <[email protected]> (cherry picked from commit 3223783)
Successfully created backport PR for |
Signed-off-by: Kent Rancourt <[email protected]> Co-authored-by: Hidde Beydals <[email protected]> Signed-off-by: Diego Caspi <[email protected]>
Signed-off-by: Kent Rancourt <[email protected]> Co-authored-by: Hidde Beydals <[email protected]>
Fixes #3071
Although this is WIP 👀 would still be helpful, @hiddeco. 🙏
Apart from the git bits of this, this PR is the catalyst for the step execution engine to start distinguishing between errors/failures that are terminal and ones that can be retried. This seemed like the time, because once a merge conflict that requires manual resolution has been detected, there is no point in performing any retries, no matter what the retry policy says.
I tried to implement what we discussed offline, but I think you'll have some good constructive feedback on this.
r4r